An ad hoc data source is any semistructured data source for which useful data analysis and transformation tools are not readily available. Systems administrators, computational biologists, financial analysts and many others must regularly query, transform and display such data. PADS is a domain-specific language extension for C and O'Caml that allows programmers to specify the formats of ad hoc data sources using a set of type declarations. From these declarations, the PADS compiler generates a collection of useful tools, including a parser, printer, data validator, formatter, error profiler, XML converter and query engine.
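To make the idea of a generated parser and validator concrete, here is a hand-written C sketch of the kind of code a format description saves you from writing. It assumes a hypothetical record format `id|name|status`, where `id` is an unsigned integer and `status` must be `ok` or `fail`. The names `entry_rep`, `entry_pd` and `parse_entry` are illustrative only; this is not PADS syntax or PADS-generated output. It separates the parsed value (`entry_rep`) from a summary of what went wrong (`entry_pd`), so syntactic errors (a malformed field) and semantic errors (a well-formed but disallowed value) are both reported rather than silently dropped.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory representation of one "id|name|status" record. */
typedef struct {
    unsigned long id;
    char name[32];
    char status[8];
} entry_rep;

/* Hypothetical error summary for one record. */
typedef struct {
    int nerr;            /* number of errors detected */
    int first_bad_field; /* 1-based index of first bad field, 0 if none */
} entry_pd;

static void note_error(entry_pd *pd, int field) {
    if (pd->nerr++ == 0) pd->first_bad_field = field;
}

/* Parse and validate one line. Returns the number of errors (0 = clean). */
int parse_entry(const char *line, entry_rep *rep, entry_pd *pd) {
    memset(rep, 0, sizeof *rep);
    memset(pd, 0, sizeof *pd);

    /* Field 1: unsigned decimal id, followed by the literal '|'. */
    char *end = NULL;
    if (!isdigit((unsigned char)line[0])) { note_error(pd, 1); return pd->nerr; }
    errno = 0;
    rep->id = strtoul(line, &end, 10);
    if (errno != 0 || *end != '|') { note_error(pd, 1); return pd->nerr; }

    /* Field 2: name, any characters up to the next '|'. */
    const char *name = end + 1;
    const char *bar = strchr(name, '|');
    if (bar == NULL || (size_t)(bar - name) >= sizeof rep->name) {
        note_error(pd, 2);
        return pd->nerr;
    }
    memcpy(rep->name, name, (size_t)(bar - name)); /* rep was zeroed, so NUL-terminated */

    /* Field 3: status, rest of the line (without any trailing newline). */
    const char *status = bar + 1;
    size_t len = strcspn(status, "\r\n");
    if (len >= sizeof rep->status) { note_error(pd, 3); return pd->nerr; }
    memcpy(rep->status, status, len);

    /* Semantic check: the value parsed, but only "ok" and "fail" are allowed. */
    if (strcmp(rep->status, "ok") != 0 && strcmp(rep->status, "fail") != 0)
        note_error(pd, 3);
    return pd->nerr;
}
```

For example, `parse_entry("42|alice|ok\n", &r, &p)` returns 0, while `"7|bob|maybe"` parses every field but reports a semantic error in field 3. Writing and maintaining code like this by hand for every format is the burden that a declarative description is meant to remove.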
Programmers may use PADS by writing a description by hand or by asking the system to infer a PADS description directly from example data. The multi-phase inference algorithm first infers a candidate format and then optimizes it relative to an information-theoretic scoring function. Inferred descriptions may be pushed automatically through the PADS compiler to generate fully functional tools with no human intervention. The entire process takes just seconds on 1K of example data, and has the potential to greatly improve the productivity of data analysis.
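One way to picture an information-theoretic score is a minimum-description-length style cost: the bits needed to write down a description, plus the bits needed to encode the example data given that description. The lower the total, the better the description. The toy C sketch below compares just two candidate types for a single field, "any string" and "unsigned integer". The bit costs (`TYPE_BITS_*`, 8 bits per character for strings, log2(10) bits per digit for integers, a flat 8-bit length per value) are simplifying assumptions chosen for illustration, not the scoring function PADS actually uses.

```c
#include <ctype.h>
#include <math.h>
#include <stddef.h>
#include <string.h>

/* Assumed cost, in bits, of writing down each candidate description. */
#define TYPE_BITS_STRING 2.0
#define TYPE_BITS_INT    2.0
/* log2(10): bits per decimal digit (constant avoids linking libm). */
#define BITS_PER_DIGIT   3.321928094887362
/* Simplified fixed cost for encoding each value's length. */
#define LENGTH_BITS      8.0

typedef enum { FIELD_STRING, FIELD_INT } field_type;

static int all_digits(const char *s) {
    if (*s == '\0') return 0;
    for (; *s; s++)
        if (!isdigit((unsigned char)*s)) return 0;
    return 1;
}

/* Total bits under "any string": every character costs a full byte. */
double score_as_string(const char **vals, size_t n) {
    double bits = TYPE_BITS_STRING;
    for (size_t i = 0; i < n; i++)
        bits += LENGTH_BITS + 8.0 * (double)strlen(vals[i]);
    return bits;
}

/* Total bits under "unsigned integer": each character is one of 10 digits.
   If any sample is not all digits, the description cannot encode it at all. */
double score_as_int(const char **vals, size_t n) {
    double bits = TYPE_BITS_INT;
    for (size_t i = 0; i < n; i++) {
        if (!all_digits(vals[i])) return INFINITY;
        bits += LENGTH_BITS + BITS_PER_DIGIT * (double)strlen(vals[i]);
    }
    return bits;
}

/* Pick whichever candidate gives the shorter total description. */
field_type choose_field_type(const char **vals, size_t n) {
    return score_as_int(vals, n) < score_as_string(vals, n) ? FIELD_INT : FIELD_STRING;
}
```

For the samples `"42"`, `"7"`, `"1999"`, the string candidate costs 2 + 3·8 + 7·8 = 82 bits while the integer candidate costs about 2 + 3·8 + 7·3.32 ≈ 49.3 bits, so the more specific type wins. Adding a sample such as `"abc"` makes the integer candidate unusable, and the general string type is chosen instead. The score thus rewards descriptions that are precise without being unable to cover the data.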
This ongoing research is a collaboration between AT&T Research and Princeton University. It involves Mary Fernandez, Kathleen Fisher, Yitzhak Mandelbaum, David Walker, Qian Xi, and Kenny Zhu. More information, software and research papers are available at www.padsproj.org.